Auditable Versioned Data Storage Outsourcing

Authors

  • Ertem Esiner
  • Anwitaman Datta
Abstract

Auditability is crucial for data outsourcing: it facilitates accountability and the timely identification of data loss or corruption incidents, which in turn reduces the risks from such losses. In recent years, in step with the growing trend of outsourcing, much progress has been made in designing probabilistic (for efficiency) provable data possession (PDP) schemes. However, even the recent and advanced PDP solutions that deal with dynamic data do so in a limited manner, and only for the latest version of the data. A naive solution that treats different versions in isolation would work, but leads to tremendous overheads and is undesirable. In this paper, we present algorithms to achieve full persistence (all intermediate configurations are preserved and are modifiable) for an optimized skip list (known as FlexList) so that versioned data can be audited. The proposed scheme provides deduplication at the level of logical, variable-sized blocks, such that only the altered parts of the different versions are kept, while the persistent data structure facilitates access (read) of any arbitrary version with the same storage and processing efficiency that state-of-the-art dynamic PDP solutions provide for only the current version; commit (write) operations incur around 5% additional time. Furthermore, the time overhead for auditing arbitrary versions in addition to the latest version is imperceptible even on a low-end server. Additionally, the application of our approach opens up the possibility to naturally support block-level deduplication. While a naive solution to audit versions would copy the whole data and the data structure for each version, our solution utilises storage space very close to that of the most efficient delta-based solutions.
Accordingly, we explore how the proposed data structure benefits the system with block-level deduplication in addition to auditability, how it can be integrated with a state-of-the-art versioning system (Git), and how in the process it can improve the storage efficiency of Git and thus help scale the size of data stored in Git, without compromising the retrieval efficiency of arbitrary versions.

Preprint submitted to Future Generation Computer Systems, August 3, 2015. arXiv:1507.08838v1 [cs.CR], 31 Jul 2015.
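The block-level deduplication across versions described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it does not implement FlexList, persistence of a skip list, or a PDP protocol (there are no tags and no probabilistic challenges). It assumes fixed-size blocks for simplicity, whereas the abstract speaks of variable-sized blocks, and every name in it (`VersionedBlockStore`, `commit`, `read`, `audit`) is hypothetical.

```python
import hashlib


class VersionedBlockStore:
    """Toy versioned store (illustrative only, not FlexList).

    Each version is a list of block digests; identical blocks are
    stored once and shared by every version that contains them.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size  # assumption: fixed-size blocks
        self.blocks = {}              # digest -> block bytes (deduplicated)
        self.versions = []            # version id -> list of digests

    def commit(self, data: bytes) -> int:
        """Store a new version and return its version id."""
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only blocks not seen in any earlier version take new space.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.versions.append(digests)
        return len(self.versions) - 1

    def read(self, version: int) -> bytes:
        """Reassemble any stored version from its block digests."""
        return b"".join(self.blocks[d] for d in self.versions[version])

    def audit(self, version: int) -> bool:
        """Recompute the hash of every block of one version.

        This is an exhaustive local check, unlike the probabilistic
        remote audits that PDP schemes perform.
        """
        return all(
            hashlib.sha256(self.blocks[d]).hexdigest() == d
            for d in self.versions[version]
        )


store = VersionedBlockStore(block_size=4)
v0 = store.commit(b"aaaabbbbcccc")
v1 = store.commit(b"aaaaXXXXcccc")  # only the middle block changed
print(len(store.blocks))            # number of distinct blocks kept
```

In this toy example the two versions share their first and last blocks, so committing the second version adds only one new block to the store, while both versions remain readable and checkable.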


Similar resources

Multi-versioned Data Storage and Iterative Processing in a Parallel Array Database Engine


Optimal query/update tradeoffs in versioned dictionaries

External-memory dictionaries are a fundamental data structure in file systems and databases. Versioned (or fully persistent) dictionaries have an associated version tree where queries can be performed at any version, updates can be performed on leaf versions, and any version can be ‘cloned’ by adding a child. Various query/update tradeoffs are known for unversioned dictionaries, many of them wit...


Compressed Differential Erasure Codes for Efficient Archival of Versioned Data

In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner in distributed storage systems. We propose a new storage technique called differential erasure coding (DEC) where the differences (deltas) between subsequent versions are stored rather than the whole objects, akin to a typical delta encoding technique. However, unlike delta encoding te...


Nonblocking Distributed Replication of Versioned Files

In this paper, we propose a distributed data storage framework that supports unrestricted offline access. The system does not explicitly distinguish between connected and disconnected states. Its design is based on a lock-free distributed framework that avoids update conflicts through file versioning. We propose an algorithm for replica synchronization. The feasibility of this framework is conf...


Decibel: The Relational Dataset Branching System

As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple c...



Journal title:
  • Future Generation Comp. Syst.

Volume 55  Issue 

Pages  -

Publication date 2016